Cohort bias in predictive risk assessments of future criminal justice system involvement

Significance

Social science research and policy increasingly rely on predictive risk assessment instruments (RAIs), including those that use machine-learning algorithms. This paper shows that the relationships between risk factors and future arrest are unstable over time when measured across sequential birth cohorts. As a result, prediction models that rely on these risk factors are prone to systematic and substantial error. Such cohort bias, which arises from the dynamics of social change, requires updating algorithms and accounting for social factors that affect entire cohorts. Cohort bias can generate inequality in criminal justice contacts that is distinct from racial bias. It has implications not only for tailoring RAIs but also for preventive interventions that target high-risk groups on the basis of individual-level risk factors alone.


Features included in predictive models. The second column lists the available features, with the levels of categorical variables in parentheses. The third column indicates which features were used in the classic risk-factor models; all listed features are included in the full models. The fourth column gives the average age at which each feature was measured. Measures of predictive features span early childhood through late adolescence. Because our quantities of interest are predictions of arrest among cohort study members, and to be consistent with how risk assessment tools are typically used, features and analyses are based on unweighted data. Analyses using sampling-design and attrition weights yield similar conclusions.

Fig. S1. Calibration plots by model type, feature set, and cohort. Rows correspond to model types: logistic regression, lasso logistic regression, ridge logistic regression, and random forest (first through fourth rows, respectively). All models were trained on the older cohort. The first and third columns show calibration to the older cohort for models trained on the classic risk-factor feature set and the full feature set, respectively; these columns show that models trained on the older cohort are well calibrated to that cohort. The second and fourth columns show calibration to the younger cohort for the classic risk-factor and full feature sets, respectively. Regardless of algorithm or feature set, all models exhibit cohort bias when applied to the younger cohort.
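
A calibration plot of the kind shown in Fig. S1 can be built by grouping individuals by predicted arrest probability and comparing the mean prediction in each group with the observed arrest rate. The sketch below is illustrative only: it assumes equal-width probability bins (the binning or smoothing used for the published figures is not stated here), and the helper name `calibration_bins` and the toy inputs are hypothetical. Applying an older-cohort model to younger-cohort data and finding points far from the diagonal is what the captions describe as cohort bias.

```python
from typing import List, Tuple


def calibration_bins(
    predicted: List[float], observed: List[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted, observed rate, count) per equal-width bin.

    Hypothetical helper for illustration. Predictions are assumed to lie in
    [0, 1]; empty bins are skipped. For a well-calibrated model, mean
    predicted and observed rate are close in every bin (near the diagonal).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    p_sum = [0.0] * n_bins  # running sum of predicted probabilities
    y_sum = [0] * n_bins    # running count of observed arrests (y == 1)
    count = [0] * n_bins    # number of individuals in each bin
    for p, y in zip(predicted, observed):
        # p == 1.0 would index past the end, so clamp it into the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        p_sum[b] += p
        y_sum[b] += y
        count[b] += 1
    return [
        (p_sum[b] / count[b], y_sum[b] / count[b], count[b])
        for b in range(n_bins)
        if count[b] > 0
    ]


if __name__ == "__main__":
    # Toy data, not study data: three low-risk and three high-risk people.
    for mean_p, rate, n in calibration_bins(
        [0.1, 0.1, 0.15, 0.8, 0.85, 0.9], [0, 0, 1, 1, 1, 0], n_bins=2
    ):
        print(f"predicted={mean_p:.3f} observed={rate:.3f} n={n}")
```

Plotting each (mean predicted, observed rate) pair against the 45-degree line reproduces the basic layout of a calibration panel.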

Fig. S2. Calibration plots for the full lasso logistic regression model by race/ethnicity. The left column shows the calibration of a full lasso logistic regression model, trained on the older cohort, to that cohort, broken out by race/ethnicity; the model is well calibrated to the older cohort across racial/ethnic groups. The right column shows the calibration of the same model when used to predict arrest in the younger cohort. Here we observe cohort bias for all racial/ethnic groups, indicating that cohort bias is distinct from racial bias.
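
Fig. S2 checks calibration separately within each racial/ethnic group. A compact numeric companion to such plots is calibration-in-the-large per group: the ratio of observed arrests to the sum of predicted probabilities. This is a swapped-in summary statistic, not the measure reported in the figure, and the helper name `observed_expected_by_group` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def observed_expected_by_group(
    predicted: List[float], observed: List[int], groups: List[str]
) -> Dict[str, float]:
    """Observed arrests divided by expected arrests (sum of predictions), per group.

    Hypothetical helper for illustration. A ratio near 1 means the model is
    calibrated in the large for that group; a ratio well above 1 signals
    under-prediction and well below 1 signals over-prediction.
    """
    expected: Dict[str, float] = defaultdict(float)
    arrests: Dict[str, int] = defaultdict(int)
    for p, y, g in zip(predicted, observed, groups):
        expected[g] += p
        arrests[g] += y
    # Groups whose predictions sum to zero have no defined ratio and are skipped.
    return {g: arrests[g] / expected[g] for g in expected if expected[g] > 0}


if __name__ == "__main__":
    # Toy data, not study data: two hypothetical groups "A" and "B".
    print(observed_expected_by_group(
        [0.2, 0.2, 0.5, 0.5], [0, 1, 1, 0], ["A", "A", "B", "B"]
    ))
```

If a ratio near 1 in the training cohort drifts away from 1 for every group in a later cohort, the miscalibration is shared across groups, which is the pattern the caption describes.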

Fig. S3. Calibration plots for models with extended criminal history predictors and a shorter prediction window. The first row shows calibration plots for a logistic regression model trained on the classic risk-factor feature set, and the second row shows calibration plots for a lasso logistic regression model trained on the full feature set. Both models were trained on the older cohort and use arrest history from ages 17 to 21 as a predictor of arrest between ages 22 and 24. The first column shows the calibration of these two models to the older cohort, and the second column shows their calibration to the younger cohort. Cohort bias is somewhat attenuated relative to earlier specifications but remains present in both models.

Fig. S4. Calibration plots for models with juvenile criminal history as a predictor. The first row shows calibration plots for a logistic regression model trained on the classic risk-factor feature set, and the second row shows calibration plots for a lasso logistic regression model trained on the full feature set. Both models were trained on the 9-year-old cohort, the subset of the older cohort for which full juvenile arrest records are available. Arrest history from ages 10 to 16 is used as a predictor of arrest between ages 17 and 24. The first column shows the calibration of these two models to the older cohort, and the second column shows their calibration to the younger cohort. Adding juvenile arrest history does not substantively change the observed cohort bias: the calibration slopes for the younger cohort, 0.54 and 0.62, are nearly identical to those from the original specification without juvenile arrest history (0.53 and 0.64).
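
One common way to estimate a calibration slope like those quoted above is logistic recalibration: regress the observed outcome on the logit of the predicted probability and read off the coefficient. The caption does not say how its slopes were computed (they may instead come from the regression line drawn on the calibration plots), so treat this pure-Python sketch, with its hypothetical `calibration_slope` helper fitted by Newton-Raphson, as an illustration under that assumption. Under this definition, a slope of 1 means predictions vary as much as observed risk does, and a slope below 1 means the predicted probabilities are more extreme than the outcomes support.

```python
import math
from typing import List, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calibration_slope(
    predicted: List[float], observed: List[int], iters: int = 50
) -> Tuple[float, float]:
    """Fit logit P(y = 1) = a + b * logit(p) by Newton-Raphson; return (a, b).

    Hypothetical helper for illustration; b is the calibration slope.
    """
    eps = 1e-12  # keeps logit finite for predictions of exactly 0 or 1
    x = [math.log(max(p, eps) / max(1.0 - p, eps)) for p in predicted]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, observed):
            mu = _sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0:
            break  # singular information matrix, e.g. all predictions equal
        # Newton step: (a, b) += inverse(information) @ score.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


if __name__ == "__main__":
    # Toy data, not study data: predictions of 0.2 and 0.8, but observed
    # rates of 0.3 and 0.7, so the predictions are too extreme (slope < 1).
    p = [0.2] * 10 + [0.8] * 10
    y = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3
    print(calibration_slope(p, y))
```

With only two distinct predicted values the fit is saturated, so the toy slope equals logit(0.7) / logit(0.8), about 0.61; real data would have many distinct predictions.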

Fig. S5. … cohort. The calibration plots include only individuals predicted to be in the top quartile of arrest risk. The top row shows the calibration of the classic risk-factor logistic regression model for the older cohort (left) and the younger cohort (right). The bottom row shows the same for the full lasso logistic regression model. The deviation of the regression line from the diagonal in the right column indicates that cohort bias persists even when the calibration analysis is limited to the highest-risk individuals, as defined by predicted arrest probability.
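
The restriction in Fig. S5 amounts to ranking individuals by predicted arrest probability and keeping the highest-ranked quarter before checking calibration. The sketch below reduces that subset to a single summary point (mean predicted risk versus observed arrest rate) rather than a full plot. The helper name `top_quartile_calibration`, rounding the quartile size up to a whole person, and breaking ties by input order are all assumptions; the caption also does not state whether the cutoff is computed within each cohort, which this sketch does implicitly by taking one cohort's data at a time.

```python
from typing import List, Tuple


def top_quartile_calibration(
    predicted: List[float], observed: List[int]
) -> Tuple[float, float, int]:
    """Mean predicted risk, observed arrest rate, and size of the top quartile.

    Hypothetical helper for illustration. Individuals are ranked by predicted
    probability and the top ceil(n / 4) are kept; Python's sort is stable, so
    ties at the cutoff keep their input order.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("need at least one prediction")
    n_top = -(-len(predicted) // 4)  # integer ceiling of n / 4
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    top = order[:n_top]
    mean_pred = sum(predicted[i] for i in top) / n_top
    obs_rate = sum(observed[i] for i in top) / n_top
    return mean_pred, obs_rate, n_top


if __name__ == "__main__":
    # Toy data, not study data: eight people with rising predicted risk.
    print(top_quartile_calibration(
        [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], [0, 0, 0, 0, 0, 1, 0, 1]
    ))
```

A mean predicted risk well above the observed rate in this subset would mean the model overstates risk precisely for the people it flags as highest risk.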

Fig. S6. Calibration plots for a classic risk-factor logistic regression model that includes changing neighborhood characteristics as predictors. These calibration plots show the performance of a logistic regression model trained on the older cohort using the classic risk-factor feature set together with neighborhood census characteristics measured at ages 9, 13, and 17. Including these features does not substantially change the degree of cohort bias, suggesting that they do not adequately capture the broader social changes that create it.

Fig. S7. The top row shows the calibration of the classic risk-factor logistic regression model for the older cohort (left) and the younger cohort (right). The bottom row shows the same for the full lasso logistic regression model. Both models were trained on the older cohort to predict whether an individual would be arrested on a non-drug charge. The cohort bias observed in earlier specifications could in principle reflect shifting enforcement policies, particularly for highly discretionary charges such as drug offenses. However, excluding this discretionary arrest category has little effect on the degree of cohort bias, indicating that changes in drug enforcement practices are not its primary driver.